AITopics | continuous action space

continuous action space

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Neural Information Processing Systems | Jun-16-2026, 01:09:23 GMT

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data to estimate Q-values, and an actor is trained to maximize the critic's output. Despite their popularity, these methods often suffer from instability during training. In this work, we propose a purely value-based framework for continuous control that revisits structural maximization of Q-functions, introducing a set of key architectural and algorithmic choices to enable efficient and stable learning. We evaluate the proposed actor-free Q-learning approach on a range of standard simulation tasks, demonstrating performance and sample-efficiency on par with state-of-the-art baselines, without the cost of learning a separate actor. Particularly, in environments with constrained action spaces, where the value functions are typically non-smooth, our method with structural maximization outperforms traditional actor-critic methods with gradient-based maximization. We have released our code at https://github.com/USC-Lira/Q3C.
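The abstract does not spell out the architecture, so the following is only a minimal sketch of what "structural maximization" can mean in general, not the Q3C method. It swaps in a well-known alternative, a quadratic Q-function in the style of normalized advantage functions, whose maximizing action can be read directly from the network outputs without a separate actor. The values of `state_value`, `mu`, and `L` are hypothetical stand-ins for network outputs.

```python
import numpy as np

def quadratic_q(state_value, mu, L, action):
    """Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P = L L^T."""
    P = L @ L.T  # positive definite when L is lower-triangular with positive diagonal
    diff = action - mu
    return state_value - 0.5 * diff @ P @ diff

def greedy_action(mu):
    """With P positive definite, the quadratic is maximized exactly at a = mu."""
    return mu

# Hypothetical per-state network outputs (illustrative numbers only).
state_value = 1.5
mu = np.array([0.2, -0.4])
L = np.array([[1.0, 0.0],
              [0.5, 2.0]])

# Compare the structural maximizer against randomly sampled actions.
rng = np.random.default_rng(0)
random_actions = rng.uniform(-1.0, 1.0, size=(1000, 2))
q_random = np.array([quadratic_q(state_value, mu, L, a) for a in random_actions])
q_best = quadratic_q(state_value, mu, L, greedy_action(mu))
print("best sampled Q:", q_random.max(), "structural max:", q_best)
```

The design point is that the maximum over actions is available in closed form, so the Q-learning target needs no gradient-based search over the critic. A single quadratic is unimodal, which is one reason more expressive structurally maximizable forms are of interest.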

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Maximum Causal Tsallis Entropy Imitation Learning

Kyungjae Lee, Sungjoon Choi, Songhwai Oh

Neural Information Processing Systems | Mar-14-2026, 19:24:45 GMT

In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by assigning zero probability. Second, we prove that an MCTE problem is equivalent to robust Bayes estimation in the sense of the Brier score. Third, we propose a maximum causal Tsallis entropy imitation learning (MCTEIL) algorithm with a sparse mixture density network (sparse MDN) by modeling mixture weights using a sparsemax distribution. In particular, we show that the causal Tsallis entropy of an MDN encourages exploration and efficient mixture utilization while Shannon entropy is less effective.
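To make the contrast between sparsemax and softmax concrete, here is a minimal NumPy sketch of the standard sparsemax projection onto the probability simplex. It illustrates only the distribution itself, not the MCTEIL algorithm or the sparse MDN; the `logits` values are hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1.0 + k * z_sorted > cumsum
    k_z = k[in_support][-1]                # size of the supporting set
    tau = (cumsum[k_z - 1] - 1.0) / k_z    # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())                # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])        # hypothetical action scores
p_sparse = sparsemax(logits)
p_soft = softmax(logits)
print("sparsemax:", p_sparse)
print("softmax:  ", p_soft)
```

For these scores sparsemax puts all mass on the first action and assigns exactly zero to the others, while softmax keeps every probability strictly positive. Scaling the scores down (for example `[0.5, 0.3, 0.1]`) widens the supporting set.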

artificial intelligence, demonstration, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

A Detailed Proof: A.1 Proof of Theorem 4.1

Neural Information Processing Systems | Feb-17-2026, 23:20:32 GMT

We can compute the fixed point of the recursion in Equation A.2 and get the following estimated [...] Then we compare these two gaps. To utilize Eq. 4 for policy optimization, following the analysis in Section 3.2 of Kumar et al. [...] By choosing different regularizers, there are a variety of instances within the CQL family. [...] B.36, called CFCQL(H), which is the update rule we used: [...] In discrete action spaces, we train a three-level MLP network with MLE loss. In continuous action spaces, we use the method of explicit estimation of behavior density in Wu et al. [...]

artificial intelligence, cql, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

Jianzhun Shao, Yun Qu

Neural Information Processing Systems | Feb-17-2026, 23:20:28 GMT

MARL in real scenarios remains challenging because of the same safety and efficiency concerns that arise in the single-agent setting, so offline RL in the multi-agent setting is worth investigating.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Maximum Causal Tsallis Entropy Imitation Learning

Kyungjae Lee, Sungjoon Choi, Songhwai Oh

Neural Information Processing Systems | Feb-12-2026, 10:58:36 GMT

Neural Information Processing Systems http://nips.cc/

demonstration, entropy, imitation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Semi-Parametric Efficient Policy Learning with Continuous Actions

Victor Chernozhukov, Mert Demirer, Greg Lewis, Vasilis Syrgkanis

Neural Information Processing Systems | Feb-11-2026, 09:55:04 GMT

Neural Information Processing Systems http://nips.cc/

assumption, value function, variance, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

d5eca8dc3820cad9fe56a3bafda65ca1-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 08:56:06 GMT

We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance and 116.0%

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

In this section, we present detailed proofs for the theoretical derivation of Thm. 1, which aims to solve the following optimization problem: min

Neural Information Processing Systems | Feb-9-2026, 23:15:29 GMT

These assumptions are not strong and can be satisfied in most environments, including MuJoCo, Atari games, and so on. Let $f$ be a Lebesgue integrable function and let $P$ and $Q$ be two probability distributions with $|f| \le C$; then $\mathbb{E}_{P(x)} f(x) - \mathbb{E}_{Q(x)} f(x) \le C\, D_{TV}(P, Q)$ (5). Proof. Suppose there are two actions $a_1$, $a_2$ under state $s$, and let $Q_1(s, a_1) = u$, $Q_1(s, a_2) = v$. In this way, we can derive the upper bound of $\mathbb{E}_{a \sim \pi_2} Q_1(s, a) - \mathbb{E}_{a \sim \pi_1} Q_1(s, a)$ as above. Since both sides of the above equation have the same minimum (here the minima are given by Qk = Q), we can replace the objective in Problem 2 with the upper bound in Eq. (10) and solve the relaxed optimization problem.
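A bound of the form in Eq. (5) can be sanity-checked numerically on discrete distributions. The excerpt does not define the total variation distance, so this sketch assumes the L1 convention, D_TV(P, Q) = Σ|P(x) − Q(x)|, under which the constant C suffices; under the convention with a factor of 1/2 the constant would be 2C. The support size, the value of `C`, and the sampling scheme are arbitrary illustrative choices.

```python
import numpy as np

def l1_distance(p, q):
    """Assumed convention: D_TV(P, Q) = sum_x |P(x) - Q(x)| (no factor 1/2)."""
    return float(np.abs(p - q).sum())

rng = np.random.default_rng(0)
C = 3.0                                   # hypothetical bound on |f|
gaps, bounds = [], []
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))         # random distribution P on 5 points
    q = rng.dirichlet(np.ones(5))         # random distribution Q on 5 points
    f = rng.uniform(-C, C, size=5)        # random function with |f| <= C
    gaps.append(float(f @ p - f @ q))     # E_P[f] - E_Q[f]
    bounds.append(C * l1_distance(p, q))  # right-hand side of the bound

gaps = np.array(gaps)
bounds = np.array(bounds)
print("largest gap / bound ratio:", (gaps / bounds).max())
```

The check rests on the pointwise inequality |Σ f(x)(P(x) − Q(x))| ≤ C Σ|P(x) − Q(x)|, so the ratio printed at the end should never exceed 1.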

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: